Abstract:In safety-critical robotics applications, guaranteed and practical uncertainty quantification (UQ) in perception is vital. Many existing works either offer no formal containment guarantee, rely on restrictive modeling assumptions, or focus only on pose estimation rather than a complete SLAM pipeline. This paper presents provably guaranteed UQ algorithms for 3D-3D landmark-based SLAM. The algorithms consist of three basic UQ modules: forward UQ for mapping, backward UQ for pose tracking, and pose compound. Each module produces a certified uncertainty set; when the input uncertainty bounds are deterministic, the output sets inherit deterministic guarantees, i.e., they provably contain the true poses and landmarks. Specifically, we use polytopes to represent uncertainty sets, enabling tractable computations and a unified treatment of pose uncertainty. To enhance the algorithms' practical usability, we incorporate conformal prediction to calibrate measurement uncertainty from data with a prescribed probability. Simulations and experiments demonstrate that the proposed algorithms provide both strong theoretical guarantees and practical usability. The code is open-sourced at https://github.com/LIAS-CUHKSZ/Polytopic-SLAM-Uncertainty-Quantification.
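As a rough illustration of the calibration step mentioned in the abstract above, the following is a minimal sketch of generic split conformal prediction, not the authors' implementation. The function name `conformal_radius`, the scalar residual scores, and the miscoverage level `alpha` are assumptions made for this example; the paper's actual score function and how the bound feeds into the polytopic sets are not specified in the abstract.

```python
import math


def conformal_radius(scores, alpha):
    """Return a split conformal bound from calibration scores.

    scores: nonconformity scores (e.g. measurement residual norms) computed
            on a held-out calibration set; hypothetical input for this sketch.
    alpha:  target miscoverage level, so the bound should hold for a fresh
            exchangeable sample with probability at least 1 - alpha.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: only a trivial bound exists.
        return math.inf
    return sorted(scores)[k - 1]
```

Under the usual exchangeability assumption, a new score falls below the returned value with probability at least `1 - alpha`, which is the sense in which a measurement bound can be calibrated "with a prescribed probability".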
Abstract:Diversity and multiplexing are the two fundamental gains of multiple-input and multiple-output (MIMO) communications, enabling systems to simultaneously achieve increased reliability and higher data rates. The intricate interplay between these two metrics is captured by the celebrated diversity-multiplexing tradeoff (DMT). With the rapid evolution of wireless technologies, low-latency integrated sensing and communication (ISAC) has emerged as a key enabler for 6G applications, including extended reality (XR) and massive digital twins. Consequently, understanding the DMT within MIMO ISAC systems becomes critical. In this paper, we investigate the communication DMT in a mono-static MIMO ISAC system under Rayleigh fading, specifically when the transmitter is constrained to emit sensing-optimal waveforms. By unveiling the geometric properties of generalized Stiefel manifolds and employing large-deviation analysis, we characterize the asymptotic outage probability of this typical ISAC channel. This formulation yields an elegant converse bound on the sensing-constrained DMT. Ultimately, our work provides an answer to a pivotal unanswered question in ISAC system design: How much MIMO gain is fundamentally sacrificed in communication to integrate optimal sensing capabilities?
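For context, the unconstrained baseline against which such a sensing-constrained result is naturally compared is the classical DMT of Zheng and Tse. The statement below is that classical result, not the paper's new bound, and the symbols $M$, $N$, $r$, and $d^*(r)$ are notation assumed here rather than taken from the abstract. For an i.i.d. Rayleigh-fading channel with $M$ transmit and $N$ receive antennas and a coding block length of at least $M+N-1$, the optimal tradeoff curve is the piecewise-linear function through the points

```latex
\[
  d^*(k) = (M-k)(N-k), \qquad k = 0, 1, \dots, \min(M,N),
\]
% Endpoints: maximum diversity gain d^*(0) = MN at multiplexing gain r = 0,
% and d^*(r) = 0 at the maximum multiplexing gain r = \min(M,N).
```

For example, with $M = N = 2$ the curve connects $(0,4)$, $(1,1)$, and $(2,0)$.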
Abstract:The rapid progress in radar and communication places increasing demands on low-latency and energy-efficient array signal processing methods. There is an emerging direction of constructing analog computing processors for directly processing electromagnetic (EM) waves. However, existing methods are constrained by a 2D physical aperture and an imprecise design process with an inefficient computing architecture, resulting in limited sensing resolution and a limited number of separable sources. Here, we present a fully-analog array signal processor (FASP) that uses a 3D aperture engineering framework to perform super-resolution direction-of-arrival estimation, source number estimation, and multi-channel source separation in parallel for both coherent and incoherent sources. 3D aperture engineering is realized by constructing deep cascaded metasurface layers so that the diffractive propagation from oblique incident fields can be layer-wise modulated and piecewise encoded for perceiving EM fields far exceeding physical aperture limits. Multi-dimensional synthetic aperture (MSA) training is developed to characterize the metasurface modulation and optimize the neuro-augmented physical model, extending the system aperture and generating a high-order nonlinear angular response. FASP orthogonalizes the array response vectors of communication channels to map them onto antenna detectors in the analog domain. An $N$-layer FASP can achieve ~$N$ times higher angular resolution than the Rayleigh diffraction limit. Experiments further validate source number estimation and independent channel separation for 10 targets, which can suppress radar jamming signals by ~20 dB and enhance channel communication capacity by 13.5 times at 36-41 GHz. FASP heralds a paradigm shift in signal processing for super-resolution optics, advanced radar, and 6G communications.
Abstract:Neural combinatorial optimization (NCO) solvers, implemented with graph neural networks (GNNs), have introduced new approaches for solving routing problems. Trained with reinforcement learning (RL), the state-of-the-art graph attention model (GAM) achieves near-optimal solutions without requiring expert knowledge or labeled data. In this work, we generalize the existing graph attention mechanism and propose the extended graph attention model (EGAM). Our model utilizes multi-head dot-product attention to update both node and edge embeddings, addressing the limitations of the conventional GAM, which considers only node features. We employ an autoregressive encoder-decoder architecture and train it with policy gradient algorithms that incorporate a specially designed baseline. Experiments show that EGAM matches or outperforms existing methods across various routing problems. Notably, the proposed model demonstrates exceptional performance on highly constrained problems, highlighting its efficiency in handling complex graph structures.
Abstract:Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to an improvement of more than 27.3% over oracle-localization baselines and strong generalization across embodiments and environments. The code and models are publicly available at https://steinate.github.io/logoplanner.github.io.
Abstract:This study aims to simulate real-world clinical scenarios to systematically evaluate the ability of Large Language Models (LLMs) to extract core medical information from patient chief complaints laden with noise and redundancy, and to verify whether they exhibit a functional decline analogous to Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD). We employed a cross-sectional analysis design based on standardized medical probes, selecting four mainstream LLMs as research subjects: GPT-4o, Gemini 2.5, DeepSeek 3.1, and Qwen3-Max. An evaluation system comprising twenty medical probes across five core dimensions was used to simulate a genuine clinical communication environment. All probes had gold-standard answers defined by clinical experts and were assessed via a double-blind, inverse rating scale by two independent clinicians. The results show that all tested models exhibited functional defects to varying degrees, with Qwen3-Max demonstrating the best overall performance and Gemini 2.5 the worst. Under conditions of extreme noise, most models experienced a functional collapse. Notably, GPT-4o made a severe misjudgment in the risk assessment for pulmonary embolism (PE) secondary to deep vein thrombosis (DVT). This research is the first to empirically confirm that LLMs exhibit features resembling metabolic dysfunction when processing clinical information, proposing the innovative concept of "AI-Metabolic Dysfunction-Associated Steatotic Liver Disease (AI-MASLD)". These findings offer a crucial safety warning for the application of Artificial Intelligence (AI) in healthcare, emphasizing that current LLMs must be used as auxiliary tools under human expert supervision, as there remains a significant gap between their theoretical knowledge and practical clinical application.
Abstract:Free play is a fundamental aspect of early childhood education, supporting children's cognitive, social, emotional, and motor development. However, assessing children's development during free play poses significant challenges due to the unstructured and spontaneous nature of the activity. Traditional assessment methods often rely on direct observations by teachers, parents, or researchers, which may fail to capture comprehensive insights from free play and provide timely feedback to educators. This study proposes an innovative approach combining Large Language Models (LLMs) with learning analytics to analyze children's self-narratives of their play experiences. The LLM identifies developmental abilities, while performance scores across different play settings are calculated using learning analytics techniques. We collected 2,224 play narratives from 29 children in a kindergarten, covering four distinct play areas over one semester. According to the evaluation results from eight professionals, the LLM-based approach achieved high accuracy in identifying cognitive, motor, and social abilities, with accuracy exceeding 90% in most domains. Moreover, significant differences in developmental outcomes were observed across play settings, highlighting each area's unique contributions to specific abilities. These findings confirm that the proposed approach is effective in identifying children's development across various free play settings. This study demonstrates the potential of integrating LLMs and learning analytics to provide child-centered insights into developmental trajectories, offering educators valuable data to support personalized learning and enhance early childhood education practices.
Abstract:Although existing 3D perception algorithms have demonstrated significant improvements in performance, their deployment on edge devices continues to encounter critical challenges due to substantial runtime latency. We propose a new benchmark tailored for online evaluation that takes runtime latency into account. Based on this benchmark, we build a Latency-Aware 3D Streaming Perception (LASP) framework that addresses the latency issue through two primary components: 1) latency-aware history integration, which extends query propagation into a continuous process, ensuring the integration of historical features regardless of varying latency; 2) latency-aware predictive detection, a module that compensates the detection results with the predicted trajectory and the posterior accessed latency. By incorporating the latency-aware mechanism, our method generalizes across various latency levels, achieving an online performance that closely aligns with 80% of its offline evaluation on the Jetson AGX Orin without any acceleration techniques.




Abstract:In this paper, we first present a bias-eliminated weighted (Bias-Eli-W) perspective-n-point (PnP) estimator for stereo visual odometry (VO) with provable consistency. Specifically, leveraging statistical theory, we develop an asymptotically unbiased and $\sqrt{n}$-consistent PnP estimator that accounts for varying 3D triangulation uncertainties, ensuring that the relative pose estimate converges to the ground truth as the number of features increases. Next, on the stereo VO pipeline side, we propose a framework that continuously triangulates contemporary features for tracking new frames, effectively decoupling temporal dependencies between pose and 3D point errors. We integrate the Bias-Eli-W PnP estimator into the proposed stereo VO pipeline, creating a synergistic effect that enhances the suppression of pose estimation errors. We validate the performance of our method on the KITTI and Oxford RobotCar datasets. Experimental results demonstrate that our method: 1) achieves significant improvements in both relative pose error and absolute trajectory error in large-scale environments; 2) provides reliable localization under erratic and unpredictable robot motions. The successful implementation of the Bias-Eli-W PnP in stereo VO indicates the importance of information screening in robotic estimation tasks with high-uncertainty measurements, shedding light on diverse applications where PnP is a key ingredient.




Abstract:Localization and tracking (LocTrack) are fundamental enablers for a wide range of emerging applications. Reconfigurable intelligent surfaces (RISs) have emerged as key components for enhancing LocTrack performance. This paper investigates a multi-RIS-assisted multi-user (MRMU) LocTrack system, where multiple RISs collaboratively reflect the position-bearing signals for information fusion at the base station, leveraging spatial-temporal correlations in user positions. While studies have shown that these correlations improve localization accuracy, their trade-offs with system complexity remain unclear. To address this gap, we characterize the effectiveness of utilizing spatial-temporal correlation priors (STPs) in MRMU LocTrack systems using a metric termed efficiency of correlation (EoC). To further elucidate correlation propagation and RIS interactions, we provide a "correlation information routing" interpretation of EoC through random walk theory. EoC provides a principled performance evaluation metric that enables system designers to balance localization accuracy enhancement against the increased complexity. Additionally, we investigate the error propagation phenomenon, analyzing its convergence and asymptotic behavior in MRMU LocTrack systems. Finally, we validate the theoretical results through extensive numerical simulations.